fix: stabilize cpe sorting during collection sort #3009

Merged: spiffcs merged 2 commits into main from cpe-sorting on Jul 9, 2024

@spiffcs spiffcs commented Jul 3, 2024

Summary

This branch attempts to fix #2967. The output is not yet fully deterministic.
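As a rough illustration of the kind of fix the title describes, here is a minimal Go sketch of sorting with a total order. It is not syft's code: `cpeEntry`, its fields, `less`, and `sortCPEs` are hypothetical names, and the sample values are made up. The assumption is that unstable output can arise when a comparator leaves ties between distinct entries, so the final order depends on the order in which they arrived.

```go
package main

import (
	"fmt"
	"sort"
)

// cpeEntry is a hypothetical, simplified stand-in for a CPE record.
// It is NOT syft's type; the fields are assumptions for illustration.
type cpeEntry struct {
	Vendor  string
	Product string
	Version string
	Source  string // hypothetical: where the CPE came from
}

// less compares every field in a fixed order, so two entries only
// compare as equal when they are identical. That makes it a total
// order, and the sorted result no longer depends on input order.
func less(a, b cpeEntry) bool {
	if a.Vendor != b.Vendor {
		return a.Vendor < b.Vendor
	}
	if a.Product != b.Product {
		return a.Product < b.Product
	}
	if a.Version != b.Version {
		return a.Version < b.Version
	}
	return a.Source < b.Source
}

// sortCPEs sorts in place. With a total order, stability only matters
// for fully identical entries, but SliceStable costs little here.
func sortCPEs(cpes []cpeEntry) {
	sort.SliceStable(cpes, func(i, j int) bool { return less(cpes[i], cpes[j]) })
}

func main() {
	// Made-up sample values, supplied in two different arrival orders.
	x := cpeEntry{"acme", "widget", "1.0", "declared"}
	y := cpeEntry{"acme", "widget", "1.0", "generated"}
	z := cpeEntry{"acme", "gadget", "2.0", "generated"}

	run1 := []cpeEntry{x, y, z}
	run2 := []cpeEntry{z, y, x}
	sortCPEs(run1)
	sortCPEs(run2)

	// Each line reports whether both runs agree at that position.
	for i := range run1 {
		fmt.Println(run1[i] == run2[i], run1[i])
	}
}
```

If the comparator stopped at `Version`, `x` and `y` would tie and their relative order would follow the input, which is the sort of run-to-run difference that changes an SBOM's hash.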

Reproduce

FROM alpine:latest

RUN apk update

RUN apk add jenkins

Build the image above, then run syft from this branch against it:

for i in (seq 1 20); ~/go/bin/syft -o json 2967:test | jq . > test_sbom/test_$i.json; end

Then use something like the following to print the SHA-256 of each SBOM:

#!/opt/homebrew/bin/fish

# Change directory to the target directory
cd test_sbom

# Loop through each file in the directory
for file in *
    if test -f $file  # Check if the item is a regular file
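        # sha256sum already prints "<hash>  <file>", and fish's echo does not expand
        # \t inside double quotes, so each output line ends with a literal "\t<file>"
        set sha (sha256sum $file)
        echo "$sha\t$file"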
    end
end

On main, this produces roughly 7 or 8 unique SHAs across 20 SBOMs generated from the same image.
On this branch, the same test produces only 2 unique SHAs.
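For anyone who wants the tally of unique SHAs directly rather than counting by eye, the same check can be sketched in Go. This is only an illustration: `hexDigest` and `groupByDigest` are hypothetical helper names, and the program hashes every regular file in the directory given as the first argument (the current directory if none is given).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// hexDigest returns the hex-encoded SHA-256 of data.
func hexDigest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// groupByDigest inverts a file-name -> digest map into
// digest -> sorted list of file names sharing that digest.
func groupByDigest(digests map[string]string) map[string][]string {
	groups := make(map[string][]string)
	for name, d := range digests {
		groups[d] = append(groups[d], name)
	}
	for _, names := range groups {
		sort.Strings(names)
	}
	return groups
}

func main() {
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Hash every regular file in the directory.
	digests := make(map[string]string)
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		digests[e.Name()] = hexDigest(data)
	}

	// Print digests in sorted order so this tool's own output is stable.
	groups := groupByDigest(digests)
	keys := make([]string, 0, len(groups))
	for d := range groups {
		keys = append(keys, d)
	}
	sort.Strings(keys)
	for _, d := range keys {
		fmt.Printf("%s  %d file(s)\n", d, len(groups[d]))
	}
	fmt.Printf("%d unique digest(s)\n", len(keys))
}
```

Run it as `go run . test_sbom`. A digest of `e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855` is the SHA-256 of empty input, so a file showing that value is empty rather than a differing SBOM.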

This is still WIP: I'm working out what remaining nondeterminism causes the two different SHAs.

Current determinism progress between main and this branch for the sample image

The remaining diff on this branch is no longer related to CPE inconsistency. It comes from how Java packages are created and stored in the collection during cataloging. The relationships generated for the two outputs, 4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4 and 5d5655ad32cdbfbcdaaabe6a314f6656145ad33dcb54428802fd3dc153e436d0, are also inconsistent.

This is what I'm currently investigating before submitting this PR for review.

Main

7275bc491df371ae43338df29d8bcc91277a949c00dc5aa0b017009ac5442384  test_1.json\ttest_1.json
7275bc491df371ae43338df29d8bcc91277a949c00dc5aa0b017009ac5442384  test_2.json\ttest_2.json
7275bc491df371ae43338df29d8bcc91277a949c00dc5aa0b017009ac5442384  test_3.json\ttest_3.json
7275bc491df371ae43338df29d8bcc91277a949c00dc5aa0b017009ac5442384  test_4.json\ttest_4.json
7275bc491df371ae43338df29d8bcc91277a949c00dc5aa0b017009ac5442384  test_5.json\ttest_5.json
825e8612d1bb070e42451718cab9f69e0439be3228880d697d140f19340ff55a  test_6.json\ttest_6.json
711c903cf104e84ed89b1057f634a4cc594edee6c2007eeb3e489ebb089daf0d  test_7.json\ttest_7.json
7275bc491df371ae43338df29d8bcc91277a949c00dc5aa0b017009ac5442384  test_8.json\ttest_8.json
521a382fb033df30649eb9204519e4bb66d376b353ae11c4a492f2245fb00663  test_9.json\ttest_9.json
2dc6d6f47a53e12c384ea37a6d395461f19ee1a016e8e97b7137330206293406  test_10.json\ttest_10.json
9f86203a5511a109d97f9972fcd79b07ff77b3331058b6016ba5fa1b9ace3b48  test_11.json\ttest_11.json
7275bc491df371ae43338df29d8bcc91277a949c00dc5aa0b017009ac5442384  test_12.json\ttest_12.json
fd0c866c23b306b8dfdd9e64e1f16808037b3c6c9a1425fd45e5411362340bde  test_13.json\ttest_13.json
6257fd4022d7589cdf981a52b25a2e3e532810548358c09fc8e6ca10f43e9a78  test_14.json\ttest_14.json
7275bc491df371ae43338df29d8bcc91277a949c00dc5aa0b017009ac5442384  test_15.json\ttest_15.json
7275bc491df371ae43338df29d8bcc91277a949c00dc5aa0b017009ac5442384  test_16.json\ttest_16.json
297748476a36259f1ff14f6087333823dca77c4f13ce33198acaabbafde254f6  test_17.json\ttest_17.json
9cb06d031e5b6b95279ef0c6a5faf4380acfcc117fb6699ab9c7ea5df7e6095b  test_18.json\ttest_18.json
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  test_19.json\ttest_19.json

Branch

[I] hal@ChristophersMBP ~/d/s/2967> ./sha.fish
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_1.json\ttest_1.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_2.json\ttest_2.json
5d5655ad32cdbfbcdaaabe6a314f6656145ad33dcb54428802fd3dc153e436d0  test_3.json\ttest_3.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_4.json\ttest_4.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_5.json\ttest_5.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_6.json\ttest_6.json
5d5655ad32cdbfbcdaaabe6a314f6656145ad33dcb54428802fd3dc153e436d0  test_7.json\ttest_7.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_8.json\ttest_8.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_9.json\ttest_9.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_10.json\ttest_10.json
5d5655ad32cdbfbcdaaabe6a314f6656145ad33dcb54428802fd3dc153e436d0  test_11.json\ttest_11.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_12.json\ttest_12.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_13.json\ttest_13.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_14.json\ttest_14.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_15.json\ttest_15.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_16.json\ttest_16.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_17.json\ttest_17.json
5d5655ad32cdbfbcdaaabe6a314f6656145ad33dcb54428802fd3dc153e436d0  test_18.json\ttest_18.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_19.json\ttest_19.json
4d076f84e08b338743ee76f68f2ba516267d954dece28ff700c6ebc152ca95a4  test_20.json\ttest_20.json

Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>

spiffcs commented Jul 3, 2024

The final bit of nondeterminism here comes from the Java cataloger and how packages are deduplicated in the sample image. I believe the final jansi package is sourced from one of the two virtual paths shown below, and which one varies between runs.

Whichever one "wins" and makes it into the final SBOM causes other side effects around relationships, CPEs, and other package metadata.

This causes the sha of the generated SBOM to be different depending on which "jansi" made it into the final document.

-        virtualPath: "/usr/share/webapps/jenkins/jenkins.war:WEB-INF/lib/jansi-1.11.jar:org.fusesource.jansi:jansi"
+        virtualPath: "/usr/share/webapps/jenkins/jenkins.war:WEB-INF/lib/jansi-1.11.jar"
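One way to make this kind of "which one wins" choice repeatable is to pick the winner by an explicit rule instead of by arrival order. The Go sketch below is an assumption-laden illustration, not the change made in this PR: `javaPkg`, `pkgKey`, and `dedupe` are hypothetical names, the name and version are inferred from the jar file name for the example, and the tie-break rule (lexicographically smallest virtual path) is arbitrary.

```go
package main

import (
	"fmt"
	"sort"
)

// javaPkg is a hypothetical, reduced package record.
// syft's real package type carries far more data.
type javaPkg struct {
	Name        string
	Version     string
	VirtualPath string
}

// pkgKey is the (assumed) identity used to detect duplicates.
type pkgKey struct {
	Name    string
	Version string
}

// dedupe keeps one package per name and version. When several
// candidates collide, the one with the lexicographically smallest
// VirtualPath is kept, so the winner does not depend on input order.
func dedupe(pkgs []javaPkg) []javaPkg {
	best := make(map[pkgKey]javaPkg)
	for _, p := range pkgs {
		k := pkgKey{p.Name, p.Version}
		cur, seen := best[k]
		if !seen || p.VirtualPath < cur.VirtualPath {
			best[k] = p
		}
	}

	// Map iteration order is random in Go, so sort before returning.
	out := make([]javaPkg, 0, len(best))
	for _, p := range best {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Name != out[j].Name {
			return out[i].Name < out[j].Name
		}
		return out[i].Version < out[j].Version
	})
	return out
}

func main() {
	a := javaPkg{"jansi", "1.11", "/usr/share/webapps/jenkins/jenkins.war:WEB-INF/lib/jansi-1.11.jar:org.fusesource.jansi:jansi"}
	b := javaPkg{"jansi", "1.11", "/usr/share/webapps/jenkins/jenkins.war:WEB-INF/lib/jansi-1.11.jar"}

	// Both arrival orders select the same package.
	fmt.Println(dedupe([]javaPkg{a, b})[0].VirtualPath)
	fmt.Println(dedupe([]javaPkg{b, a})[0].VirtualPath)
}
```

Any fixed rule would do for reproducibility. Which candidate is the semantically right one to keep is a separate question that this sketch does not try to answer.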

In

@spiffcs spiffcs marked this pull request as ready for review July 9, 2024 14:08
syft/pkg/collection.go (outdated review thread, resolved)
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>

spiffcs commented Jul 9, 2024

@wagoodman updated based on your comments

@spiffcs spiffcs merged commit f7ffcc5 into main Jul 9, 2024
11 checks passed
@spiffcs spiffcs deleted the cpe-sorting branch July 9, 2024 18:24
Successfully merging this pull request may close these issues.

Order CPEs deterministically for SBOM reproducibility
2 participants